Holistic Approach for Classifying and Retrieving Personal Arabic Handwritten Documents

Author

  • SALAMA BROOK
Abstract

This paper presents a novel holistic technique for classifying and retrieving Arabic handwritten text documents. Retrieval proceeds in three steps. First, the handwritten document images are segmented into words, and each word is segmented into its connected parts. Second, several features are extracted from these connected parts and combined into one consolidated feature vector that represents the word. Finally, a generalized feedforward neural network learns to classify the different styles/fonts into word classes, which are then used to retrieve Arabic handwritten text documents.

Key-Words: Data mining of Arabic text, Word recognition, Arabic handwriting, Segmentation of Arabic handwritten documents, Feature extraction, Classification, Retrieval of Arabic handwritten documents.
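The first two steps of the pipeline described in the abstract (splitting a word image into connected parts, then merging per-part features into one vector) can be illustrated with a minimal sketch. The abstract does not name the features or the segmentation algorithm, so everything here is an assumption made for illustration: the function names `connected_parts`, `part_features`, and `word_vector`, the choice of 8-connectivity, the two features (aspect ratio and ink density), and the fixed `max_parts` padding are all hypothetical. The neural network classifier is not shown.

```python
from collections import deque

def connected_parts(image):
    """Return the 8-connected foreground regions of a binary image.

    `image` is a list of rows of 0/1 values; each region is a list of (row, col).
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    parts = []
    for r in range(rows):
        for c in range(cols):
            if not image[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited ink pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            parts.append(pixels)
    return parts

def part_features(pixels):
    """Assumed illustrative features: bounding-box aspect ratio and ink density."""
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    height = max(ys) - min(ys) + 1
    width = max(xs) - min(xs) + 1
    return [width / height, len(pixels) / (width * height)]

def word_vector(image, max_parts=3):
    """Concatenate per-part features into one fixed-length vector for a word."""
    parts = connected_parts(image)
    # Arabic is written right to left, so order parts by their rightmost column.
    parts.sort(key=lambda p: -max(x for _, x in p))
    vector = []
    for part in parts[:max_parts]:
        vector.extend(part_features(part))
    # Zero-pad so every word yields a vector of the same length.
    vector.extend([0.0] * (2 * max_parts - len(vector)))
    return vector

# Toy "word" with two separate parts: a 2x2 block and a 3-pixel vertical stroke.
WORD = [
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
]
```

Calling `word_vector(WORD)` yields a six-element vector: two features for each of the two parts found, followed by zero padding for the absent third part. A fixed-length vector of this kind is what a feedforward network would take as input.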

Similar resources

Classification of Personal Arabic Handwritten Documents

This paper presents a novel holistic technique for classifying Arabic handwritten text documents. The classification of Arabic handwritten documents is performed in several steps. First, the Arabic handwritten document images are segmented into words, and then each word is segmented into its connected parts. Second, several structural and statistical features are extracted from these connected ...

Neural Network Based Segmentation Algorithm for Arabic Characters Recognition

This paper presents a novel holistic technique for classifying Arabic handwritten text documents, which is performed in several steps. First, the Arabic handwritten document images are segmented into their connected parts. A simple heuristic segmentation algorithm is used, which finds segmentation points in printed and cursive handwritten words. Second, several features are extracted from the...

W-TSV: Weighted topological signature vector for lexicon reduction in handwritten Arabic documents

This paper proposes a holistic lexicon-reduction method for ancient and modern handwritten Arabic documents. The word shape is represented by the weighted topological signature vector (W-TSV), which encodes graph data into a low-dimensional vector space. Three directed acyclic graph (DAG) representations are proposed for Arabic word shapes, based on topological and geometrical features. Lexicon...

Methods of the Arabic Manuscripts Digitization

The authors acknowledge Saint-Petersburg State University for research grant 2.37.175.2014. Abstract: Mediaeval Arabic manuscripts are not only valuable artifacts but also one of the major sources of scholarly information in the field of Oriental Studies. This paper discusses the methods of Arabic manuscript digitization. Over the last fifteen years a lot of Arabic manuscri...

Separation of Overlapping and Touching Lines within Handwritten Arabic Documents

In this paper, we propose an approach for the separation of overlapping and touching lines within handwritten Arabic documents. Our approach is based on the morphology analysis of the terminal letters of Arabic words. Starting from 4 categories of possible endings, we use the angular variance to follow the connection and separate the endings. The proposed separation scheme has been evaluated on...

Publication date: 2008